Currently, most deep learning methods cannot cope with the scarcity of industrial product defect samples and the large variation in defect characteristics. This paper proposes an unsupervised defect detection algorithm based on a reconstruction network, realized using only easily obtained defect-free sample data. The network comprises two parts: image reconstruction and surface defect region detection. The reconstruction network is designed as a lightweight, fully convolutional autoencoder. Only a small number of normal samples are used for training, so that the reconstruction network learns to generate defect-free reconstructed images. A function combining structural loss and $L_1$ loss is proposed as the loss function of the reconstruction network to address the poor detection of defects on irregularly textured surfaces. Further, the residual between the reconstructed image and the image under test indicates possible defect regions, and conventional image operations can then localize the defect. The proposed unsupervised defect detection algorithm is evaluated on multiple defect image sample sets. Compared with other similar algorithms, the results show that it achieves strong robustness and accuracy.
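For illustration, here is a minimal sketch of such a combined structural (SSIM) + $L_1$ reconstruction loss in PyTorch. The uniform-window SSIM, the weighting factor `lam`, and the thresholding of the residual map are our assumptions; the paper's exact formulation may differ.

```python
import torch.nn.functional as F

def ssim(x, y, window_size=11, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM over 4D tensors (N, C, H, W) with a uniform window."""
    pad = window_size // 2
    mu_x = F.avg_pool2d(x, window_size, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window_size, stride=1, padding=pad)
    var_x = F.avg_pool2d(x * x, window_size, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, window_size, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, window_size, 1, pad) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return (num / den).mean()

def reconstruction_loss(recon, target, lam=0.5):
    """Structural term + L1 term; lam = 0.5 is an assumed weighting."""
    return lam * (1.0 - ssim(recon, target)) + (1.0 - lam) * F.l1_loss(recon, target)

# At test time, the per-pixel residual serves as the candidate defect map:
#   residual = (image - autoencoder(image)).abs()
# followed by conventional operations (thresholding, morphology) to localize defects.
```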
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice, as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
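As an aside, the patch-based strategy most respondents used for oversized samples can be sketched in a few lines; the patch size and stride below are arbitrary assumptions, not survey findings.

```python
import numpy as np

def extract_patches(image, patch=(64, 64), stride=(32, 32)):
    """Slide a window over a large 2D image and yield overlapping training patches."""
    h, w = image.shape[:2]
    for i in range(0, h - patch[0] + 1, stride[0]):
        for j in range(0, w - patch[1] + 1, stride[1]):
            yield image[i:i + patch[0], j:j + patch[1]]

patches = list(extract_patches(np.zeros((512, 512))))  # 225 overlapping 64x64 patches
```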
The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
We present a strong object detector with encoder-decoder pretraining and finetuning. Our method, called Group DETR v2, is built upon a vision transformer encoder ViT-Huge~\cite{dosovitskiy2020image}, a DETR variant DINO~\cite{zhang2022dino}, and an efficient DETR training method Group DETR~\cite{chen2022group}. The training process consists of self-supervised pretraining and finetuning of a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Objects365, and finally finetuning it on COCO. Group DETR v2 achieves $\textbf{64.5}$ mAP on COCO test-dev and establishes a new SoTA on the COCO leaderboard (https://paperswithcode.com/sota/object-detection-on-coco).
Spiking Neural Networks (SNNs) have been studied for decades for their biological plausibility and promising energy efficiency. Throughout existing SNNs, the leaky integrate-and-fire (LIF) model is commonly adopted to formulate the spiking neuron, and it has evolved into numerous variants with different biological features. However, most LIF-based neurons support only a single biological feature in their neuronal behaviors, limiting their expressiveness and neuronal dynamic diversity. In this paper, we propose GLIF, a unified spiking neuron, to fuse different bio-features across neuronal behaviors, enlarging the representation space of spiking neurons. In GLIF, gating factors, which are exploited to determine the proportion of the fused bio-features, are learnable during training. Combining all learnable membrane-related parameters, our method can make spiking neurons diverse and constantly changing, thus increasing the heterogeneity and adaptivity of spiking neurons. Extensive experiments on a variety of datasets demonstrate that our method obtains superior performance compared with other SNNs by simply changing their neuronal formulations to GLIF. In particular, we train a spiking ResNet-19 with GLIF and achieve $77.35\%$ top-1 accuracy with six time steps on CIFAR-100, which advances the state of the art. Codes are available at \url{https://github.com/Ikarosy/Gated-LIF}.
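A toy PyTorch sketch of the gating idea follows: a learnable per-channel gate blends two neuronal behaviors (here, leaky vs. non-leaky integration). The specific fused bio-features, the initialization, and the omitted surrogate gradient are simplifications of ours, not GLIF's exact formulation.

```python
import torch
import torch.nn as nn

class GatedLIF(nn.Module):
    """Toy gated LIF neuron: a learnable gate fuses leaky and non-leaky
    integration, loosely following the gating idea in GLIF."""
    def __init__(self, channels, tau=2.0, v_th=1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(channels))  # gating logits, learnable
        self.tau, self.v_th = tau, v_th

    def forward(self, inputs):                 # inputs: (T, B, C) input currents
        gate = torch.sigmoid(self.alpha)       # in (0, 1), one gate per channel
        decay = gate * (1.0 - 1.0 / self.tau) + (1.0 - gate)  # blend the two behaviors
        v = torch.zeros_like(inputs[0])
        spikes = []
        for x in inputs:                       # iterate over time steps
            v = decay * v + x                  # blended leaky/non-leaky integration
            s = (v >= self.v_th).float()       # fire (surrogate gradient omitted)
            v = v * (1.0 - s)                  # hard reset after a spike
            spikes.append(s)
        return torch.stack(spikes)
```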
Many classical fairy tales, novels, and scripts use dialogue to advance the storyline and establish characters. We present the first study to explore whether machines can understand and generate dialogue in stories, which requires capturing the traits of different characters and the relationships between them. To this end, we propose two new tasks, Masked Dialogue Generation and Dialogue Speaker Recognition, i.e., generating missing dialogue turns and predicting the speakers of specified dialogue turns, respectively. We construct a new dataset, DialStory, consisting of 105K Chinese stories that contain a large amount of dialogue, to support evaluation. We demonstrate the difficulty of the proposed tasks by testing existing models on DialStory with automatic and manual evaluation. Furthermore, we propose learning explicit character representations to improve performance on these tasks. Extensive experiments and case studies show that our approach generates more coherent and informative dialogue and achieves higher speaker recognition accuracy than strong baselines.
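To make the two task definitions concrete, here is a hypothetical (invented, English-language) instance of each; DialStory itself is Chinese and its actual format may differ.

```python
story = "\"We should leave before dawn,\" whispered Mara. [MASK] \"Then it is settled,\" Mara said."

masked_dialogue_generation = {
    "input": story,  # generate the missing dialogue turn at [MASK]
    "target": "\"Agreed. The guards change at first light,\" Tomas replied.",
}
dialogue_speaker_recognition = {
    "input": story.replace("[MASK]", "\"Agreed. The guards change at first light.\""),
    "question": "Who speaks the second dialogue turn?",
    "target": "Tomas",
}
```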
Thanks to the development of deep neural networks, significant progress has been made in just noticeable difference (JND) modelling, especially with the recently developed unsupervised JND generation models. However, they have a major drawback: the generated JND is assessed in the real-world signal domain rather than in the perceptual domain within the human brain. There is an obvious difference between JND assessed in these two domains, since real-world visual signals are encoded by the human visual system (HVS) before being delivered to the brain. Hence, we propose an HVS-inspired signal degradation network for JND estimation. To achieve this, we carefully analyze the HVS perceptual process during subjective JND viewing to obtain relevant insights, and then design an HVS-inspired signal degradation (HVS-SD) network to represent the signal degradation in the HVS. On the one hand, the well-learnt HVS-SD enables us to assess the JND in the perceptual domain. On the other hand, it provides more accurate prior information to better guide JND generation. Moreover, considering the requirement that a reasonable JND should not cause visual attention to shift, a visual attention loss is proposed to control JND generation. Experimental results demonstrate that the proposed method achieves SOTA performance in accurately estimating the redundancy of the HVS. The source code will be available at https://github.com/jianjin008/hvs-sd-jnd.
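One plausible reading of the visual attention loss, sketched below, is a penalty on the distance between attention maps computed from the original and the JND-contaminated images; this is our assumption, not the paper's confirmed formulation.

```python
import torch.nn.functional as F

def visual_attention_loss(attn_original, attn_jnd):
    """Assumed form: keep the attention map of the JND-contaminated image
    close to that of the original, so the injected JND does not shift attention."""
    return F.l1_loss(attn_jnd, attn_original)
```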
We introduce a power-of-two low-bit post-training quantization (PTQ) method for deep neural networks that meets hardware requirements and does not need long-term retraining. Power-of-two quantization can convert the multiplications introduced by quantization and dequantization into bit-shifts, which are adopted by many efficient accelerators. However, power-of-two scale factors have fewer candidate values, which leads to larger rounding or clipping errors. We propose a novel power-of-two PTQ framework, dubbed RAPQ, which dynamically adjusts the power-of-two scales of the whole network instead of statically determining them layer by layer. It can theoretically trade off the rounding error and clipping error of the whole network. Meanwhile, the reconstruction method in RAPQ is based on the BN information of each unit. Extensive experiments on ImageNet prove the excellent performance of our proposed method. Without bells and whistles, RAPQ reaches 65% and 48% accuracy on ResNet-18 and MobileNetV2, respectively, with INT2 weights and INT4 activations. We are the first to propose a more constrained but hardware-friendly power-of-two quantization scheme for low-bit PTQ, and we prove that it can achieve nearly the same accuracy as SOTA PTQ methods. The code has been released.
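A minimal NumPy sketch shows why power-of-two scales are hardware-friendly: with scale $2^{-k}$, dequantization reduces to a bit shift. The values and bit-widths below are illustrative; RAPQ's dynamic network-wide scale adjustment and BN-based reconstruction are not shown.

```python
import numpy as np

def quantize_pow2(x, k, n_bits=4):
    """Quantize with a power-of-two scale 2**-k; dequantization is then a shift."""
    qmax = 2 ** (n_bits - 1) - 1
    return np.clip(np.round(x * 2.0 ** k), -qmax - 1, qmax).astype(np.int32)

k = 3                                          # scale exponent (chosen per layer/network)
q = quantize_pow2(np.array([0.7, -1.2, 0.05]), k)
dequant = q.astype(np.float64) / (1 << k)      # multiply by 2**-k == right shift in HW
# Note: -1.2 is clipped here, illustrating the clipping error that a power-of-two
# scheme must trade off against rounding error.
```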
Building models for natural language processing (NLP) in low-resource scenarios, where only limited data are available, is challenging. Optimization-based meta-learning algorithms achieve promising results in low-resource scenarios by adapting a well-generalized model initialization to handle new tasks. Nonetheless, these approaches suffer from the memorization overfitting issue, where the model tends to memorize meta-training tasks while ignoring the support set when adapting to new tasks. To address this issue, we propose a memory imitation meta-learning (MemIML) method that strengthens the model's reliance on the support set for task adaptation. Specifically, we introduce a task-specific memory module to store support set information and construct an imitation module that forces query sets to imitate the behaviors of representative support-set samples stored in the memory. A theoretical analysis is provided to prove the effectiveness of our method, and empirical results also show that our method outperforms competitive baselines on both text classification and generation tasks.
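A rough sketch of the mechanism as described (the similarity measure and loss choice are our assumptions): store support-set representations in a task-specific memory, then pull query representations toward their nearest stored entries.

```python
import torch.nn.functional as F

def imitation_loss(query_feats, support_memory):
    """Toy version: each query feature imitates its most similar
    support-set feature stored in the task-specific memory."""
    sim = F.normalize(query_feats, dim=-1) @ F.normalize(support_memory, dim=-1).T
    nearest = support_memory[sim.argmax(dim=-1)]      # representative support samples
    return F.mse_loss(query_feats, nearest.detach())  # pull queries toward the memory
```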
Recently, various view synthesis distortion estimation models have been studied to better serve 3-D video coding. However, they can hardly model the relationship quantitatively among different levels of depth change, texture degeneration, and view synthesis distortion (VSD), which is crucial for rate-distortion optimization and rate allocation. In this paper, an auto-weighted layer representation based view synthesis distortion estimation model is developed. First, the sub-VSD (S-VSD) is defined according to the levels of depth change and their associated texture degeneration. After that, a set of theoretical derivations demonstrates that the VSD can be approximately decomposed into S-VSDs multiplied by their associated weights. To obtain the S-VSDs, a layer-based representation of the S-VSD is developed, where all pixels with the same level of depth change are represented by a layer, enabling efficient S-VSD calculation at the layer level. Meanwhile, a nonlinear mapping function is learnt to accurately represent the relationship between the VSD and the S-VSDs, automatically providing the weights for the S-VSDs during VSD estimation. To learn such a function, a dataset of VSDs and their associated S-VSDs is built. Experimental results show that the VSD can be accurately estimated with the weights learnt by the nonlinear mapping function once its associated S-VSDs are available. The proposed method outperforms relevant state-of-the-art methods in both accuracy and efficiency. The dataset and source code of the proposed method will be available at https://github.com/jianjin008/.
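The core decomposition can be stated compactly (the indexing is ours): with $L$ layers, each grouping the pixels that share the same level of depth change,

$$\mathrm{VSD} \;\approx\; \sum_{l=1}^{L} w_l \cdot \mathrm{S\text{-}VSD}_l,$$

where the weights $w_l$ are produced by the learnt nonlinear mapping function during VSD estimation.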